Episodic RL

https://gyazo.com/220c98468c78f67099ba91e46983f3a5

Episodic RL keeps an explicit record of past events and uses this record directly as a point of reference when making new decisions.

To decide, it compares an internal representation of the current situation with stored representations of past situations.
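A minimal sketch of this store-and-compare loop, under stated assumptions: states are small numeric vectors, similarity is Euclidean distance, and the value of an action is the average return of its `k` most similar stored records. The class `EpisodicMemory` and its methods `write`, `estimate`, and `act` are hypothetical names chosen for illustration, not taken from the paper.

```python
import math


class EpisodicMemory:
    """Hypothetical store of (state, action, observed return) records."""

    def __init__(self, k=2):
        self.k = k
        self.records = []  # explicit record of past events

    def write(self, state, action, observed_return):
        # Store the event as-is; nothing is learned incrementally here.
        self.records.append((tuple(state), action, float(observed_return)))

    def estimate(self, state, action):
        # Compare the current state with every stored state for this action
        # and average the returns of the k closest matches.
        matches = [(math.dist(state, s), g)
                   for s, a, g in self.records if a == action]
        if not matches:
            return 0.0  # assumed default for actions never tried
        nearest = sorted(matches)[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def act(self, state, actions):
        # Pick the action whose similar past situations paid off best.
        return max(actions, key=lambda a: self.estimate(state, a))


memory = EpisodicMemory(k=2)
memory.write((0.0, 0.0), "left", 1.0)
memory.write((0.1, 0.0), "left", 0.8)
memory.write((0.0, 0.1), "right", 0.2)
memory.write((0.1, 0.1), "right", 0.1)
memory.write((5.0, 5.0), "right", 2.0)
memory.write((5.1, 5.0), "right", 1.8)

near_origin = memory.act((0.05, 0.05), ["left", "right"])
far_away = memory.act((4.9, 5.0), ["left", "right"])
```

With these made-up records, a query near the origin favours `"left"` and a query near `(5, 5)` favours `"right"`, because each query only consults the stored situations it most resembles.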

This approach resembles ‘instance-’ or ‘exemplar-based’ theories of learning in psychology [31, 32].

Episodic RL inherently assumes a kind of smoothness principle: similar states will generally call for similar actions. Rather than being learned, this inductive bias is wired into the structure of the learning system that defines episodic RL. In current AI parlance, this is a case of ‘architectural bias’ or ‘algorithmic bias’, in contradistinction to ‘learned bias’, as seen in meta-RL.
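One way to make the smoothness assumption concrete is a kernel-weighted value estimate. This is an illustrative form, not a formula from these notes: $s_i$ and $G_i$ denote the stored states and returns recorded for action $a$, and $k$ is an assumed similarity kernel that decays with distance.

```latex
\hat{Q}(s, a) \;=\; \frac{\sum_{i} k(s, s_i)\, G_i}{\sum_{j} k(s, s_j)},
\qquad
k(s, s_i) \;=\; \exp\!\left(-\frac{\lVert s - s_i \rVert^{2}}{2\sigma^{2}}\right)
```

Because $k$ varies smoothly with $s$, two nearby states receive nearly the same weights and therefore nearly the same value estimates. The bias comes from the choice of $k$ and the bandwidth $\sigma$, which are fixed by the designer rather than learned.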

Source: Botvinick et al. (2019), Reinforcement Learning, Fast and Slow